Annotation Guidelines for Arabic Nominal 
Gender, Number, and Rationality 

Nizar Habash and Sarah Alkuhlani 
Columbia University 

The annotation task we define here is focused on information relevant to 
modeling Arabic nominal gender and number computationally. First we define the 
various facts regarding number and gender in Modern Standard Arabic and then 
we present the task guidelines and examples. 



I. Arabic Gender and Number Facts 



1. Arabic nouns inflect for gender (masculine/feminine) and number 
(singular/dual/plural). We will not address the dual in this task. 

2. We distinguish two types of gender/number: form-based gender/number 
([...]) and functional (logical) gender/number ([...]). For many nouns, the 
form-based and functional values are the same, but not always. For 
example, [...] and [...] are both masculine singular functionally, although 
[...] has a feminine ending. Similarly, [...] and [...] (as in pregnant) are 
both feminine singular, although [...] looks like a masculine noun. Other 
examples include words like [...] and [...], which are both masculine plural 
functionally but have feminine singular forms. We are concerned only with 
the functional gender/number ([...]). 

3. Arabic adjectives agree with the nouns they modify in gender and number, 
except for plural irrational (non-human, [...]) nouns, which always 
take feminine singular adjectives. 

[...] 

This does not mean that the adjective [...] is plural. It only means that it 
can be used to modify some plural nouns. Note that there are nouns that 
are semantically rational/human but morphologically not: [...] 
(nation/people, [...]). 

4. Number quantification in Arabic has many complex rules. We focus on a 
couple of rules that interact with gender and number in interesting ways. 

a. Numbers 3-10 always take a plural noun: [...]. 

b. Numbers over 10 always take a singular noun: [...], etc. 

c. Numbers 3-10 have masculine and feminine forms: [...]. 
Masculine numbers are used with nouns whose singular form is 
feminine, and feminine numbers are used with nouns whose 
singular form is masculine. For example: [...], but [...] 
(the singular is [...], which is masculine). 
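Rules 4a-4c can be sketched as a small lookup. This is only an illustration, not part of the annotation tooling: the function name and the returned dictionary are hypothetical, rule b is read as applying to numbers above 10, and cases the rules above do not mention (numbers below 3, and the gender of number words above 10) are deliberately left undecided.

```python
def quantified_noun_agreement(n, singular_gender):
    """Sketch of rules 4a-4c (hypothetical helper, not from the guidelines).

    n is the quantifying number; singular_gender is the gender of the
    noun's singular form, "M" or "F".
    """
    if singular_gender not in ("M", "F"):
        raise ValueError("singular_gender must be 'M' or 'F'")
    if 3 <= n <= 10:
        # 4a: the noun is plural.
        # 4c: the number word takes the gender opposite to the singular noun.
        number_gender = "F" if singular_gender == "M" else "M"
        return {"noun_number": "P", "number_gender": number_gender}
    if n > 10:
        # 4b: the noun is singular. The gender of the number word is not
        # covered by the rules above, so it is left as None.
        return {"noun_number": "S", "number_gender": None}
    raise ValueError("numbers below 3 are not covered by rules 4a-4c")
```

For instance, a noun whose singular is masculine takes a feminine number word and a plural noun form when counted with 5.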

5. Some nouns in Arabic are semantically plural but morphologically singular. 
They do not agree morphologically like plurals: [...]. If [...] were plural, it 
should take a feminine singular adjective since it is irrational: [...], 
which is incorrect. Words like [...] are mass nouns that have a singular 
form related to them: [...]. 

6. Some nouns, which are often thought of as collective, are plural 
morphologically: [...] ([...], not [...]). 



II. Task Guidelines 

The task is to annotate examples of Arabic nouns with three features: functional 
gender, functional number, and rationality. 

a. Functional gender can be M (masculine), F (feminine), B (both), or U 
(unknown). 

b. Functional number can be S (singular), D (dual), P (plural), B (S, D or P), 
or U (unknown). 

c. Rationality can be R (rational), I (irrational), N (not marked), U (unknown) 

The unknown value is only used when the annotator is not sure what the 
correct answer is, e.g., the annotator does not understand the word, or it is 
not clear how to apply the tests for different decisions. These cases will be 
checked later by a supervisor. 
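As a rough illustration of the label scheme above, the following sketch checks that a three-character label uses only the listed values and flags labels that contain U for supervisor review. The function names are hypothetical and the check is an assumption about how one might validate annotations; it does not handle the "ADD:" and "ERR" conventions described under Errors.

```python
# Allowed values per position, taken from items a-c above.
GENDER_VALUES = {"M", "F", "B", "U"}
NUMBER_VALUES = {"S", "D", "P", "B", "U"}
RATIONALITY_VALUES = {"R", "I", "N", "U"}


def parse_label(label):
    """Split a gender-number-rationality label such as 'MSR' into its parts."""
    if len(label) != 3:
        raise ValueError("label must have exactly three characters")
    gender, number, rationality = label
    if gender not in GENDER_VALUES:
        raise ValueError("invalid gender value: " + gender)
    if number not in NUMBER_VALUES:
        raise ValueError("invalid number value: " + number)
    if rationality not in RATIONALITY_VALUES:
        raise ValueError("invalid rationality value: " + rationality)
    return gender, number, rationality


def needs_review(label):
    """True if any feature is U (unknown) and should go to a supervisor."""
    return "U" in parse_label(label)
```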

Entries in the lexicon you will be annotating will look as follows: 



###   [...]   [...]   noun   author, writer
###   [...]   [...]   noun   author, writer
###   [...]   [...]   noun   authors, writers
###   [...]   [...]   noun   authors, writers
###   [...]   [...]   noun   authors, writers



The entries are automatically clustered to bring together related forms. The first 
column is the label that you need to modify. The second is the word form of 
interest. The word is only cited in the nominative form ([...]) and with no 
definite article ([...]). This does not mean that other forms are excluded. Think 
of [...], for example, as representing [...], etc. The third is the basic 
lemma/vocable ([...]). The fourth is the part-of-speech. The last column is the 
English translation. 

Note that the value B for number is only used for a small set of closed classes. 
See Section IV. 

After finishing the annotation, the label column would look like this: 



MSR   [...]   [...]   noun   author, writer
FSR   [...]   [...]   noun   author, writer
MPR   [...]   [...]   noun   authors, writers
FPR   [...]   [...]   noun   authors, writers
MPR   [...]   [...]   noun   authors, writers



Next are some tests for determining the correct value for each feature. 

a. Functional number 

i. If the word can be quantified by a number (3-10) and/or modified by 
a plural adjective, it is plural, e.g., [...]; otherwise the word is singular. 

ii. Some words might look like a plural but they are not, e.g., [...]. 
This word is a singular adjective. Its plural form is [...]. 

iii. If you are not comfortable making a decision, choose U (unknown). 

b. Functional gender 

i. The gender of a singular word is the same as the gender of the 
adjective that can modify it, e.g., [...]. 

ii. Nouns that can be both masculine and feminine are marked as B 
(both), e.g., [...]. 

iii. The gender of a plural word is the same as the gender of its 
singular. So, we turn the plural into a singular first to determine 
gender: [...] -> [...] -> M; [...] -> [...] -> M; etc. 

iv. Some nouns are the plural form of more than one singular (with 
different genders): e.g., [...] is the plural of [...] (F) and of [...] (M); 
[...] is the plural of [...] (B) and of [...] (F) (each of which has an 
additional unique plural, [...] and [...], respectively). In such 
cases, assign the value B. 

v. If you are not comfortable making a decision, choose U (unknown). 
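Tests b.iii and b.iv can be pictured as combining the genders of every singular that a plural form belongs to. The sketch below assumes the annotator has already listed those singular genders; the function name is hypothetical, and treating an empty list as U is an assumption rather than something the guidelines state.

```python
def plural_gender(singular_genders):
    """Functional gender of a plural, given the genders of its singular(s).

    Each item is "M", "F", "B", or "U". One gender throughout gives that
    gender; mixed genders, or any singular marked B, give B (test b.iv).
    """
    genders = set(singular_genders)
    if not genders <= {"M", "F", "B", "U"}:
        raise ValueError("unexpected gender value")
    if not genders or "U" in genders:
        # Assumption: with no known singular, or an unknown one, defer to U.
        return "U"
    if genders == {"M"}:
        return "M"
    if genders == {"F"}:
        return "F"
    return "B"
```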

c. Rationality 

i. If the adjective of the plural noun is feminine singular, the noun is 
irrational, e.g., [...]. 

ii. The rationality of singular nouns is determined by turning them into 
the plural first: [...] -> [...] -> Irrational. 

iii. Adjectives take the value N (not marked). 

Words such as [...] have different contexts: [...] [FPI] / 
[...] [MSI] / [...] [MPR]. 

Here is our view: Arabic allows a lot of elision; in fact the constructions 
above are: 

[...] [FPI] 

[...] [MSI] 

[...] [MPR] 

So, we will go with the simplest reading when multiple readings can be 
used. So, [...] is FSI. 

[...] ; [...] (FSI ; FPI) but [...] 

iv. There are some cases with lemma ambiguity. For example, [...] can 
refer to the hotel chain or a member of the Hilton family. For these 
cases, go with the most common reading for that word. 

v. If you are not comfortable making a decision, choose U (unknown). 

d. Errors 

i. The lexicon you will be annotating may contain some errors. 

ii. For missing English translations, add the translation preceded by 
"ADD:" in the English column. For incorrect translations, write "DEL:" 
just before the word to remove. 

iii. For missing entries, add the entry by copying the full line from one of 
the existing forms first and then modifying it. Add the sequence 
"ADD:" at the beginning of your label. 

iv. For wrong entries, place "ERR" in the label. 

Here is an example: let's pretend the entry you got is this: 



###   [...]   [...]   noun   author
###   [...]   [...]   noun   author
###   [...]   [...]   noun   dancers
###   [...]   [...]   noun   dancers
###   [...]   [...]   noun   dancers



Here is how you may correct it: 

MSR       [...]   [...]   noun   author, ADD:writer
FSR       [...]   [...]   noun   author, ADD:writer
MPR       [...]   [...]   noun   DEL:dancers, ADD:authors, ADD:writers
FPR       [...]   [...]   noun   DEL:dancers, ADD:authors, ADD:writers
ERR       [...]   [...]   noun   dancers
ADD:MPR   [...]   [...]   noun   authors, writers
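The "ADD:" and "DEL:" markers in the English column can be resolved mechanically once annotation is done. The following is a minimal sketch under the assumption that translations are comma-separated and each marker is attached directly to the word it applies to; the function name is hypothetical and is not part of the annotation tooling.

```python
def apply_translation_edits(english):
    """Return the final translations after applying ADD:/DEL: markers.

    Words marked DEL: are dropped, words marked ADD: are kept without
    the marker, and unmarked words are kept unchanged.
    """
    result = []
    for item in english.split(","):
        item = item.strip()
        if not item:
            continue
        if item.startswith("DEL:"):
            continue  # incorrect translation, remove it
        if item.startswith("ADD:"):
            item = item[len("ADD:"):].strip()  # missing translation, keep it
        result.append(item)
    return result
```

Applied to the corrected plural entries above, this keeps "authors" and "writers" and drops "dancers".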



III. Examples 

G-N-R = gender-number-rationality 





Word    English                     G-N-R
[...]   Author/writer (male)        MSR
[...]   Author/writer (female)      FSR
[...]   Authors/writers (male)      MPR
[...]   Authors/writers (female)    FPR
[...]   Authors/writers (male)      MPR
[...]   Authors/writers (male)      MPR
[...]   Gentleman/Mister            MSR
[...]   Lady                        FSR
[...]   Ladies                      FPR
[...]   Gentlemen                   MPR
[...]   Carrying (masc. sing.)      MSN
[...]   pregnant (sing.)            FSR
[...]   pregnant (plur.)            FPR
[...]   Caliph                      MSR
[...]   Caliphs                     MPR
[...]   office                      MSI
[...]   offices                     MPI
[...]   exam                        MSI
[...]   exams                       MPI
[...]   story                       FSI
[...]   stories                     FPI
[...]   story                       FSI
[...]   stories                     FPI
[...]   Ants (uncountable)          MSI
[...]   Ceremonial (masc. sing.)    MSN
[...]   Ceremonial (fem. plur.)     FPN
[...]   army                        MSI
[...]   armies                      MPI
[...]   A palm date                 FSI
[...]   Some palm dates             FPI
[...]   Palm dates                  MSI
[...]   Types of palm dates         MPI
[...]   Extended family             MBR
[...]   Extended families           MPR
[...]   Arabs                       MPR







[Table: check-mark matrix with column headers -S-N, -S-N, -P--, FP-N, MP-N, MS--, FS--, -PI-, MPR-, FPR- and G-N-R row labels; cell contents not recoverable.]







IV. Closed Classes 



Closed classes include verbs, numbers, digits, pronouns, and quantifiers. We 
will discuss each class separately. 

a- Verbs: 

The functional gender and number features of verbs match their form-based 
gender and number features. Therefore, verbs are annotated automatically by 
assigning them their form-based gender and number features. The rationality 
feature for verbs is N. If the verb is 1st person, the functional gender will 
be B since the verb could refer to either M or F. 
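The automatic verb rule can be sketched in a few lines. The function name and argument names are hypothetical; the sketch assumes the form-based gender, number, and person of the verb are already available from a morphological analysis.

```python
def verb_label(form_gender, form_number, person):
    """Functional label for a verb, following the rule for closed class (a).

    Gender and number are copied from the form-based features, rationality
    is always N, and 1st person verbs get gender B.
    """
    gender = "B" if person == 1 else form_gender
    return gender + form_number + "N"
```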

b- Digits: are also annotated automatically as follows: 

=> BBN 

1 => BSN 

2 => BDN 

Other digits (e.g., 21, 482) => BPN 
Decimal numbers (e.g., 1.2, 0.5) => BBN 
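The digit mapping above is simple enough to express directly. In this sketch the function name is hypothetical, tokens are assumed to be plain digit strings with an optional decimal point, and the handling of 0 as BBN is an assumption (the first rule in the list above has lost its digit in this copy).

```python
def digit_label(token):
    """Map a digit token such as '1', '21', or '0.5' to its G-N-R label."""
    if "." in token:
        float(token)  # raises ValueError if the token is not a decimal number
        return "BBN"  # decimal numbers
    if not token.isdigit():
        raise ValueError("not a digit token: " + token)
    value = int(token)
    if value == 0:
        return "BBN"  # assumption: the rule with the missing digit is for 0
    if value == 1:
        return "BSN"
    if value == 2:
        return "BDN"
    return "BPN"  # all other whole numbers
```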

c- Numbers: 

Each number is annotated similarly to digits but with a specific gender 
value, either F or M, depending on its form. However, when a number is 
an adjective, it behaves differently and does not follow the normal 
noun-adjective agreement rules. Such numbers were given the gender and 
number value B to prevent any inconsistency with the way adjectives 
agree with nouns. 
Ex: 

[...] 

Now, let's look at the following example: 

We believe that [...] should have two different lemmas. Due to this 
limitation in our resource, we will overcome this by giving the word the 
value B for both gender and number. 



d- Pronouns: 

Pronouns were annotated on a case-by-case basis according to their core 
semantic meaning. If the pronoun is a 1st person pronoun, e.g., the 
pronoun [...], the functional gender will be B since the pronoun could refer to 
either M or F. 



e- Quantifiers: 

Quantifiers such as [...], [...], [...] can be modified by a singular, dual or 
plural, feminine or masculine word. 



Although [...] has a different meaning in the first two examples (it means 
"each" in the first example and "all" in the second example), our 
resource does not distinguish between them and gives them the same 
lemma. Due to this limitation, quantifiers were given the gender value B 
and number value B to include both cases. 

f- Comparative adjectives: 

Comparative adjectives such as [...] were given the gender value B 
and number value B. The reason is that they can be modified by a 
singular, dual or plural, feminine or masculine word. 

Some comparative adjectives have a feminine form and can only be 
modified by feminine words, such as [...]. These were given the 
gender value F. The number value will still be B since the word can modify 
or be modified by a singular, dual or plural word. 



